CATs: Cost Aggregation Transformers for Visual Correspondence

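Cost aggregation is a key step in matching tasks.

In semantic segmentation, object detection, and image editing, it is important to know the correspondences between semantically similar images.

The point is to find correspondences between instances of the same class, even when their positions or appearance differ.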

The classical matching pipeline consists of the following three stages:

feature extraction

cost aggregation

flow estimation
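
As a rough illustration of how these stages connect, here is a minimal sketch in plain Python. It is not the CATs implementation: the feature vectors are made-up toy values, `correlation_map` and `estimate_flow` are hypothetical helper names, cosine similarity is assumed as the matching score, and the cost aggregation stage (the part CATs replaces with a transformer) is left out, so flow estimation runs directly on the raw costs.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)

def correlation_map(source_feats, target_feats):
    """Raw matching costs: one score per (target position, source position) pair."""
    return [[cosine_similarity(t, s) for s in source_feats] for t in target_feats]

def estimate_flow(costs):
    """Flow estimation by argmax: best-scoring source index for each target position."""
    return [max(range(len(row)), key=row.__getitem__) for row in costs]

# Stage 1 stand-in: toy "extracted" features, 3 positions per image, 2 channels each.
# These values are assumptions for illustration only.
source_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_feats = [[0.1, 0.9], [0.9, 1.1], [2.0, 0.1]]

costs = correlation_map(source_feats, target_feats)  # raw costs (no aggregation here)
matches = estimate_flow(costs)                        # stage 3
print(matches)  # [1, 2, 0]
```

Taking the argmax over raw costs is exactly where noisy or ambiguous scores cause wrong matches, which is why a dedicated aggregation step sits between the two functions in practice.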

The goal is to establish a correspondence field $ F(i), defined for each pixel $ i, that warps the target image toward the source.
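
To make the role of the correspondence field concrete, the sketch below warps a tiny image with a per-pixel displacement. Several things here are assumptions rather than anything stated in the notes: `warp` is a hypothetical helper, $ F(i) is represented as an integer `(dx, dy)` offset, sampling is nearest-neighbor, and out-of-range coordinates are clamped to the image border.

```python
def warp(target, flow):
    """Warp `target` with a correspondence field.

    target: 2D list of pixel values, indexed as target[y][x]
    flow:   2D list of (dx, dy) integer displacements, one per pixel i = (x, y)
    Output pixel i takes the value of target at i + F(i), clamped to the image.
    """
    height, width = len(target), len(target[0])
    warped = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            sx = min(max(x + dx, 0), width - 1)   # clamp the x coordinate
            sy = min(max(y + dy, 0), height - 1)  # clamp the y coordinate
            row.append(target[sy][sx])
        warped.append(row)
    return warped

target = [[1, 2, 3],
          [4, 5, 6]]
# Toy field: every pixel looks one step to the right.
flow = [[(1, 0)] * 3 for _ in range(2)]
warped = warp(target, flow)
print(warped)  # [[2, 3, 3], [5, 6, 6]]
```

Real pipelines typically use sub-pixel displacements with bilinear sampling; the integer version is only meant to show what "defined for each pixel" means.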

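In related work, approaches to determining $ F(i) include

optimal transport

optimized Hough matching

4D/6D convolutions

but the authors argue that these are inaccurate because they are either vulnerable to deformations or have limited receptive fields.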

https://scrapbox.io/files/6552689ee19ff6001b4bf6b4.png